Problem Description

The nyc-weather-13.csv file available from http://bit.ly/nyc-weather-13 contains hourly meteorological data from 2013 for each of the three New York City airports:

  • EWR - Newark Liberty International Airport
  • JFK - John F. Kennedy International Airport
  • LGA - LaGuardia Airport

Get the data

We will use the read_csv function from the readr package, which is loaded as part of the tidyverse meta-package, to read this data from the web. We name the resulting data frame weather.

#install.packages("tidyverse")
#install.packages("knitr")
#install.packages("rmarkdown")
#install.packages("plotly")
library(tidyverse)
library(knitr)
library(rmarkdown)
library(plotly)
weather <- read_csv("http://bit.ly/nyc-weather-13")
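
Before plotting, a quick count of missing values per column can explain warnings that appear later. The sketch below uses a small made-up data frame, toy_weather (a hypothetical stand-in, not the real file), so that it runs without a network connection. With the real data, the same call would be colSums(is.na(weather)).

```r
# Toy stand-in for `weather`; values are invented for illustration only.
toy_weather <- data.frame(
  origin = c("EWR", "JFK", "LGA", "EWR"),
  month  = c(1L, 1L, 8L, 8L),
  temp   = c(37.04, 39.02, NA, 80.06)
)

# Number of missing values in each column
na_counts <- colSums(is.na(toy_weather))
na_counts
```

A nonzero count for a column means that plots and summaries using that column will need to handle missing values.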

Problem 1

Produce a plot exploring the relationship between month and temp.

Solution: month and temp are both quantitative variables, so we may start by looking at a scatterplot:

ggplot(data = weather, 
  mapping = aes(x = month, y = temp)) +
  geom_point()
Warning: Removed 1 rows containing missing values (geom_point).

This gives us a rough layout of the data. To better understand the variability, a (side-by-side) boxplot is preferred.

ggplot(data = weather, 
  mapping = aes(x = month, y = temp)) +
  geom_boxplot()
Warning: Continuous x aesthetic -- did you forget aes(group=...)?
Warning: Removed 1 rows containing non-finite values (stat_boxplot).

This isn’t the plot that we want. The first warning suggests how to proceed when the x aesthetic is continuous: add a group aesthetic. The second warning tells us that one temp value is missing from the data.

ggplot(data = weather, 
  mapping = aes(x = month, group = month, y = temp)) +
  geom_boxplot()
Warning: Removed 1 rows containing non-finite values (stat_boxplot).

This isn’t quite the plot we want either: the x axis is drawn on a continuous scale, but month takes only the integer values 1 through 12, so we set the axis breaks explicitly. We can also use the warning=FALSE chunk option to suppress the warning about missing values.

plot1 <- ggplot(data = weather, 
  mapping = aes(x = month, group = month, y = temp)) +
  geom_boxplot() +
  scale_x_continuous(breaks = 1:12)
plot1
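
An alternative sketch, assuming a discrete axis is acceptable: wrapping month in factor() makes ggplot2 treat it as categorical, so geom_boxplot() draws one box per month without a group aesthetic. The data frame toy_weather and the name plot_factor are hypothetical and the values are invented, so the block runs without the real file.

```r
library(ggplot2)

# Toy stand-in for `weather`; values are invented for illustration only.
toy_weather <- data.frame(
  month = rep(c(1L, 7L, 12L), each = 4),
  temp  = c(28, 35, 30, 41, 75, 82, 79, 88, 33, 40, 29, 37)
)

# factor() makes month discrete, so one box is drawn per level and
# no group aesthetic or continuous-scale adjustment is needed.
plot_factor <- ggplot(data = toy_weather,
  mapping = aes(x = factor(month), y = temp)) +
  geom_boxplot() +
  labs(x = "month")
plot_factor
```

The trade-off is that a factor axis shows only the months present in the data, whereas the continuous scale keeps the spacing between month numbers.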

Problem 2

Calculate the minimum temperature recorded for each month across all three airports.

min_month_temp <- weather %>%
  group_by(month) %>%
  summarize(min_temp = min(temp))
min_month_temp
# A tibble: 12 x 2
   month min_temp
   <int>    <dbl>
 1     1    10.94
 2     2    15.98
 3     3    26.06
 4     4    30.92
 5     5    13.10
 6     6    53.96
 7     7    64.04
 8     8       NA
 9     9    48.02
10    10    33.08
11    11    21.02
12    12    17.96

If we don’t like the “raw” output that is produced by default with a table, we can pass the data frame into the kable function in the knitr package or the paged_table function in the rmarkdown package to get nicer output:

kable(min_month_temp)
month  min_temp
    1     10.94
    2     15.98
    3     26.06
    4     30.92
    5     13.10
    6     53.96
    7     64.04
    8        NA
    9     48.02
   10     33.08
   11     21.02
   12     17.96

This shows that the minimum temperature for August is NA. One temp value in August is missing, and min returns NA whenever its input contains a missing value. If you look at ?min, you will see that the function has an na.rm argument, which is FALSE by default. We set it to TRUE so that missing values are dropped before the minimum is computed:

min_month_temp <- weather %>%
  group_by(month) %>%
  summarize(min_temp = min(temp, na.rm = TRUE))
paged_table(min_month_temp)
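
The effect of na.rm is easiest to see on a short vector. This is a minimal sketch with made-up values in a hypothetical vector named temps.

```r
# Invented values; the NA stands in for a missing temperature reading.
temps <- c(71.06, NA, 64.94)

min(temps)                # NA: a single missing value propagates
min(temps, na.rm = TRUE)  # 64.94: missing values are dropped first
```

One caveat: if every value in a group is missing, min(..., na.rm = TRUE) returns Inf with a warning, so it is worth checking how many values are missing before relying on the result.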

Problem 3

Produce a plot showing how minimum temperature varies across the 12 months.

ggplot(data = min_month_temp,
  mapping = aes(x = month, y = min_temp)) +
  geom_point()
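
A possible refinement of this plot, sketched on made-up data: connecting the points with geom_line() makes the seasonal pattern easier to follow, and scale_x_continuous() with base R's month.abb puts a labelled tick at every month. The names toy_min and plot3 are hypothetical, and the values are invented rather than taken from min_month_temp.

```r
library(ggplot2)

# Toy stand-in for `min_month_temp`; values are invented for illustration.
toy_min <- data.frame(
  month    = 1:12,
  min_temp = c(11, 16, 26, 31, 40, 54, 64, 59, 48, 33, 21, 18)
)

plot3 <- ggplot(data = toy_min,
  mapping = aes(x = month, y = min_temp)) +
  geom_line() +
  geom_point() +
  # One tick per month, labelled with base R's built-in abbreviations
  scale_x_continuous(breaks = 1:12, labels = month.abb) +
  labs(x = "Month", y = "Minimum temperature")
plot3
```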

Problem 4

Calculate the minimum temperature recorded for each month FOR EACH OF the three airports.

min_month_temp2 <- weather %>%
  group_by(month, origin) %>%
  summarize(min_temp = min(temp, na.rm = TRUE))
paged_table(min_month_temp2)
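
The grouped result has one row per month and airport combination. For side-by-side comparison it can be reshaped so that each airport gets its own column. The sketch below assumes tidyr's pivot_wider() is available (it is part of the tidyverse) and uses a made-up toy_weather tibble. The .groups = "drop" argument is an addition here that returns an ungrouped result.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for `weather`; values are invented for illustration only.
toy_weather <- tibble(
  origin = rep(c("EWR", "JFK", "LGA"), each = 4),
  month  = rep(c(1L, 1L, 2L, 2L), times = 3),
  temp   = c(30, 25, 28, 22,  33, NA, 31, 27,  29, 31, 24, 26)
)

toy_wide <- toy_weather %>%
  group_by(month, origin) %>%
  summarize(min_temp = min(temp, na.rm = TRUE), .groups = "drop") %>%
  # One row per month, one column per airport
  pivot_wider(names_from = origin, values_from = min_temp)
toy_wide
```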

Problem 5

Explore the multivariate relationship between month, airport, and minimum temperature via a statistical graphic.

plot5 <- ggplot(data = min_month_temp2,
  mapping = aes(x = month, y = min_temp, color = origin)) +
  geom_line() +
  geom_point()
# Print the stored plot so that it is displayed
plot5

Showing off

We can easily turn any of the plots above into interactive graphics using the plotly package and its ggplotly function. Hover over the plots!

ggplotly(plot1)
ggplotly(plot5, tooltip = c("x", "y", "color"))

Discussion: In general, the winter and fall months show the most variability in temperature and the summer months the least. This is plausible for New York City, which has some very cold winter days and some hot summer days, with a range of values in between. As noted in the boxplot in Problem 1, there is a strange outlier in May: a minimum temperature of 13.1.
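
The statements about variability can be checked numerically rather than read off the boxplots. A minimal sketch on made-up data: compute the interquartile range and the range of temp within each month, then pull out the row behind the lowest May reading. The names toy_weather, toy_spread, and may_low are hypothetical, and the values are invented.

```r
library(dplyr)

# Toy stand-in for `weather`; values are invented for illustration only.
toy_weather <- tibble(
  month = rep(c(1L, 5L, 7L), each = 5),
  temp  = c(15, 28, 35, 42, 50,   13, 55, 60, 66, 72,   70, 75, 80, 84, 88)
)

# Spread of temperature within each month
toy_spread <- toy_weather %>%
  group_by(month) %>%
  summarize(
    iqr_temp   = IQR(temp, na.rm = TRUE),
    range_temp = max(temp, na.rm = TRUE) - min(temp, na.rm = TRUE)
  )
toy_spread

# Row with the lowest May temperature
may_low <- toy_weather %>%
  filter(month == 5) %>%
  slice_min(temp, n = 1)
may_low
```

In the toy data, a single low May value inflates the range far more than the interquartile range, which is the kind of pattern an isolated outlier produces.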
